GRASS: the Graz corpus of Read And Spontaneous Speech

Authors

  • Barbara Schuppler
  • Martin Hagmüller
  • Juan Andres Morales-Cordovilla
  • Hannes Pessentheiner
Abstract

This paper describes the preparation, the speakers, the recordings, and the creation of the orthographic transcriptions of the first large-scale speech database for Austrian German. It contains approximately 1900 minutes of read and spontaneous speech produced by 38 speakers. The corpus consists of three components. First, the Conversation Speech (CS) component contains free conversations of one hour in length between friends, colleagues, couples, or family members. Second, the Commands Component (CC) contains commands and keywords which were either read or elicited by pictures. Third, the Read Speech (RS) component contains phonetically balanced sentences and digits. The speech of all components was recorded at super-wideband quality in a soundproof recording studio with head-mounted microphones, large-diaphragm microphones, a laryngograph, and a video camera. The orthographic transcriptions, which were created and subsequently corrected manually, contain approximately 290 000 word tokens from 15 000 different word types.

Similar resources

Pronunciation variant analysis using speaking style parallel corpus

To improve the recognition accuracy for spontaneous conversational speech, we collected a corpus to study how spontaneous conversational speech differs from read style speech. The corpus consists of two parts: 1) spontaneous conversational speech and 2) read speech with the same word transcriptions as the conversational speech. In word and phone recognition experiments, it was confirmed that, f...

Segmentation Cues in Spontaneous and Read Speech

Segmentation research asks how listeners locate word boundaries in the ongoing speech stream. Previous work has identified multiple cues (lexical, segmental, prosodic) which affect perception of boundary placement, but such studies have almost exclusively used careful read speech, rather than speech elicited in a natural communicative context. We report development of a segmentation-oriented co...

Common and Language Dependent Phonetic Differences Between Read and Spontaneous Speech in Russian, Finnish and Dutch

This preliminary study aims to reveal both common and language-specific phonetic differences between read and spontaneous speech in three typologically unrelated languages – Russian, Finnish, and Dutch. These languages differ in prosody, sound systems, speech styles, and means for conveying intonational meaning. Spontaneous speech was recorded from 5 to 8 speakers in each language. Transliterat...

Modeling prosody for language identification on read and spontaneous speech

This paper deals with an approach to Automatic Language Identification using only prosodic modeling. Current approaches to language identification focus mainly on phonotactics because they give the best results. We propose here to evaluate the relevance of prosodic information for language identification with read studio recordings (previous experiment [1]) and spontaneous telephone speech. F...

The Phonetic Labeling on Read and Spontaneous Discourse Corpora

Read and spontaneous discourse are two different but very significant speech styles to investigate. Phonetic labeling was therefore carried out on two corpora: ASCCD, a 10-hour read discourse corpus, and CASS, a 4-hour spontaneous discourse corpus. First, the principles and conventions of transcription are presented. Then, these two speech styles are compare...

Publication date: 2014